If the conditional information of a classical probability distribution of three random variables
is zero, then it obeys a Markov chain condition. If the conditional information is close
to zero, then it is known that the distance (minimum relative entropy) of the distribution
to the nearest Markov chain distribution is precisely the conditional information. We prove
here that this simple situation does not obtain for quantum conditional information. We
show that for tri-partite quantum states the quantum conditional information is always a
lower bound for the minimum relative entropy distance to a quantum Markov chain state,
but the distance can be much greater; indeed the two quantities can be of different asymptotic
order and may even differ by a dimensional factor.
I. INTRODUCTION

From the point of view of information theory, as well as physics, it is very interesting to know
when entropy or, more generally, information inequalities are saturated. For example, the basic
quantities von Neumann entropy $S(A) = S(\rho_A) = -\mathrm{Tr}\,\rho_A \log \rho_A$, quantum mutual information
$I(A:B) = S(A) + S(B) - S(AB)$ for a bipartite state $\rho_{AB}$ and conditional mutual information
$I(A:C|B) = S(AB) + S(BC) - S(B) - S(ABC)$ for a tripartite state $\rho_{ABC}$ are all non-negative;
for the latter two this is known as the subadditivity and strong subadditivity of the entropy,
respectively. The entropy is $0$ if and only if the state is pure, and the mutual information is $0$ if
and only if the state $\rho_{AB}$ is a product state, $\rho_{AB} = \rho_A \otimes \rho_B$.

However, in many applications it is not the case, or not known, that the state is exactly pure or
a product, only that it is very close to being so. In such situations, there are continuity bounds on
entropic quantities that one can use to quantify how small the entropy or mutual information is.
Fannes' inequality [11] states that if $\|\rho_A - \sigma_A\|_1 \leq \epsilon \leq 1/e$ (with the trace norm
$\|X\|_1 := \mathrm{Tr}\,|X| = \mathrm{Tr}\sqrt{X^\dagger X}$), then

$$|S(\rho) - S(\sigma)| \leq -\epsilon\log\epsilon + \epsilon\log d_A, \tag{1}$$

where $d_A$ is the dimension of the Hilbert space supporting the states. ("log" in this paper is always
the binary logarithm; the natural logarithm is denoted "ln".) In particular, if $\rho$ has trace distance
$\epsilon \leq 1/e$ to a pure state, then $S(\rho) \leq -\epsilon\log\epsilon + \epsilon\log d_A$. Recently, Alicki and Fannes [2] proved an
extension of the Fannes inequality to quantum conditional entropy $S(A|B) = S(AB) - S(B)$ for
bipartite states $\rho_{AB}$ and $\sigma_{AB}$: if $\|\rho_{AB} - \sigma_{AB}\|_1 \leq \epsilon \leq 1$, then

$$|S(A|B)_\rho - S(A|B)_\sigma| \leq -2\epsilon\log\epsilon - 2(1-\epsilon)\log(1-\epsilon) + 4\epsilon\log d_A. \tag{2}$$

The crucial observation here is that the bound only depends on $\epsilon$ and $d_A$, not on $d_B$ as would the
bound yielded by a naive application of the original Fannes inequality. This gives an upper bound on
the mutual information for a state that is at trace distance $\epsilon$ from a product state (using convexity
of the trace distance, and (1) and (2) together with the triangle inequality).
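
As a sanity check of eq. (1), the following minimal Python sketch evaluates both sides of Fannes'
inequality for a pair of commuting (diagonal) qubit states, for which the von Neumann entropy reduces
to the Shannon entropy of the spectrum. The function names and the example spectra are illustrative
assumptions and are not taken from the references.

```python
import math

def entropy_bits(spectrum):
    """Entropy in bits of a state with the given eigenvalues (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in spectrum if p > 0)

def trace_norm_diagonal(p, q):
    """||rho - sigma||_1 for two states that are diagonal in the same basis."""
    return sum(abs(a - b) for a, b in zip(p, q))

def fannes_bound(eps, dim):
    """Right-hand side of eq. (1); meaningful for 0 < eps <= 1/e."""
    return -eps * math.log2(eps) + eps * math.log2(dim)

# Hypothetical example: a pure qubit state and a slightly mixed neighbour.
rho = [1.0, 0.0]
sigma = [0.95, 0.05]

eps = trace_norm_diagonal(rho, sigma)
gap = abs(entropy_bits(rho) - entropy_bits(sigma))
bound = fannes_bound(eps, len(rho))
print(f"eps = {eps:.3f}, entropy gap = {gap:.4f}, Fannes bound = {bound:.4f}")
```

In this example the entropy gap stays below the bound, as eq. (1) requires whenever $\epsilon \leq 1/e$.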

Conversely, one may ask, if say the entropy of a state is small, $S(\rho) \leq \epsilon$, is it close to being
pure? Indeed yes, as the following argument shows. Fix a diagonalisation of $\rho$,
$\rho = \sum_{i=1}^{d_A} \lambda_i |e_i\rangle\langle e_i|$, with eigenvalues $\lambda_i$ arranged in decreasing order, so that
$\lambda_i \leq 1/2$ for all $i \geq 2$. Then, as $-x\log x \geq x$ for $0 \leq x \leq 1/2$,

$$\epsilon \geq S(\rho) = \sum_{i=1}^{d_A} -\lambda_i\log\lambda_i \geq \sum_{i=2}^{d_A} \lambda_i = 1 - \lambda_1. \tag{3}$$

Hence,

$$\big\|\rho - |e_1\rangle\langle e_1|\big\|_1 = 2(1-\lambda_1) \leq 2\epsilon. \tag{4}$$

Note however that this bound and Fannes' inequality are not "inverse" to each other; plugging
the $2\epsilon$ into the Fannes bound yields something much larger than order $\epsilon$.
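
The chain of inequalities (3) and (4) only involves the spectrum of the state, so it can be checked
directly. The sketch below does so for one hypothetical spectrum; the helper names and the numbers
are assumptions made for illustration.

```python
import math

def entropy_bits(spectrum):
    """Entropy in bits of a state with the given eigenvalues (0 log 0 = 0)."""
    return -sum(l * math.log2(l) for l in spectrum if l > 0)

def distance_to_top_eigenstate(spectrum):
    """Trace-norm distance 2 (1 - lambda_1) to the projector onto the top eigenvector."""
    return 2 * (1 - max(spectrum))

spectrum = [0.97, 0.02, 0.01]  # hypothetical eigenvalues, summing to 1

S = entropy_bits(spectrum)
dist = distance_to_top_eigenstate(spectrum)
print(f"S(rho) = {S:.4f}, 1 - lambda_1 = {1 - max(spectrum):.4f}, distance = {dist:.4f}")
```

Here the entropy dominates $1 - \lambda_1$ as in eq. (3), and the distance to the top eigenstate is
at most twice the entropy as in eq. (4).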

Similarly, what can we say about the state when $I(A:B) \leq \epsilon$? Here, a new quantity,
the relative entropy $D(\rho\|\sigma) = \mathrm{Tr}\,\rho(\log\rho - \log\sigma)$, comes into play, when we observe that
$I(A:B)_\rho = D(\rho_{AB}\|\rho_A \otimes \rho_B)$. Invoking another inequality between distance measures for states,
namely Pinsker's inequality, see [12],

$$D(\rho\|\sigma) \geq \frac{1}{2\ln 2}\,\|\rho - \sigma\|_1^2, \tag{5}$$

we conclude that $\|\rho_{AB} - \rho_A \otimes \rho_B\|_1 \leq 2\sqrt{\epsilon}$. Note three things. First, in both examples discussed
we found an explicit candidate for the closest pure/product state to the given state (as can be
checked). Second, the bound on the trace distance depends only on $\epsilon$, not on dimensions as in the
converse Fannes-style inequalities. Third, the relative entropy gives even tighter control on the
distance due to Pinsker's inequality.
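
For classical (commuting) states the identity $I(A:B) = D(\rho_{AB}\|\rho_A\otimes\rho_B)$ and the consequence
of Pinsker's inequality can be verified with a few lines of Python. The following is a minimal sketch
for a hypothetical $2\times 2$ joint distribution; the function names and the numbers are illustrative
assumptions.

```python
import math

def marginals(joint):
    """Row and column marginals of a joint distribution given as a nested list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def mutual_information(joint):
    """I(X:Y) in bits, computed as D(P_XY || P_X x P_Y)."""
    px, py = marginals(joint)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

def l1_distance_to_product(joint):
    """|| P_XY - P_X x P_Y ||_1, the classical counterpart of the trace norm."""
    px, py = marginals(joint)
    return sum(
        abs(pxy - px[i] * py[j])
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
    )

joint = [[0.4, 0.1], [0.1, 0.4]]  # hypothetical correlated bits

info = mutual_information(joint)
dist = l1_distance_to_product(joint)
print(f"I(X:Y) = {info:.4f}, L1 distance = {dist:.4f}, 2 sqrt(I) = {2 * math.sqrt(info):.4f}")
```

The distance to the product of the marginals is bounded both by Pinsker's inequality (5) and by the
weaker, dimension-free estimate $2\sqrt{\epsilon}$ used in the text.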

In this paper we study the quantum conditional information. If the quantum conditional
information of a tri-partite state $\rho$ vanishes, then $\rho$ obeys a quantum Markov chain condition. Here
we analyze what can be said if $\rho$ has small quantum conditional information; in particular we
investigate how close it is to a Markov chain state. The motivation is partly classical (e.g.
cryptographic [3]), but in the quantum case a strong motivation comes from considerations of new
entropy inequalities: in [18] a so-called constrained inequality for the von Neumann entropies
of subsystems was found, namely a relation which is valid provided three quantum conditional
mutual informations are zero. The desire to turn this into an unconstrained, universal inequality
led to speculations that if one understood the near-vanishing of these constraints, then perhaps
a trade-off between the constraints and the new inequality solely in terms of entropies might be
established.

In section II we review, as a model, the classical case, where it turns out that the conditional
mutual information is exactly the minimum relative entropy distance between the distribution
and the closest Markov chain distribution. In section III we formulate the analogous quantum
problem, which we analyse in the rest of the paper: section IV presents several simplifications of
the question. There we prove continuity of the minimum relative entropy, show that it is lower
bounded by the quantum conditional mutual information, and derive some useful formulas for later
numerical and analytical evaluation of the quantity. Then, in section V we specialise to pure states: we
relate the minimum relative entropy to the so-called entanglement of purification, and for a large
family of states show upper and lower bounds of matching order. These results are then used in
section VI to provide examples of states for which the minimum relative entropy is much larger
than the quantum conditional mutual information, and also ones where the dimension enters
explicitly, showing that the classical and the quantum case are very different indeed.
II. CLASSICAL CASE

In the classical case, Markov chain distributions are used to define the conditional mutual
information. A classical distribution $P_{X_1X_2\ldots X_n}(x_1,x_2,\ldots,x_n)$ forms a Markov chain, denoted
$X_1 \to X_2 \to X_3 \to \ldots \to X_n$, if the distribution can be written as

$$P_{X_1X_2\ldots X_n}(x_1,x_2,\ldots,x_n) = P_{X_1X_2}(x_1,x_2)\,P_{X_3|X_2}(x_3|x_2)\cdots P_{X_n|X_{n-1}}(x_n|x_{n-1}). \tag{6}$$

If we take any three linked variables in the above Markov chain, i.e. $X_{\alpha-1}, X_\alpha, X_{\alpha+1}$, then the
conditional distribution $P_{X_{\alpha+1}|X_\alpha\ldots X_1}(x_{\alpha+1}|x_\alpha\ldots x_1)$ depends only on $X_\alpha$, and $X_{\alpha+1}$ is
conditionally independent of $X_{\alpha-1}$, given $X_\alpha$. Consider three random variables $X$, $Y$ and $Z$ which
form a Markov chain $X \to Y \to Z$. The probability distribution for this system is

$$P_{XYZ}(xyz) = P_{XY}(xy)\,P_{Z|Y}(z|y) = P_Y(y)\,P_{X|Y}(x|y)\,P_{Z|Y}(z|y). \tag{7}$$

Aside, we define the conditional mutual information as

$$I(X:Z|Y) = \sum_{x,y,z} P_{XYZ}(xyz)\log\frac{P_{XZ|Y}(xz|y)}{P_{X|Y}(x|y)\,P_{Z|Y}(z|y)}. \tag{8}$$

Note that throughout this section we use the convention $0\log 0 = 0$. (This is justified by looking
at the behavior of $x\log x$ as $x \to 0$.) This conditional mutual information is equal to zero if and
only if for all $x$, $y$ and $z$,

$$\frac{P_{XZ|Y}(xz|y)}{P_{X|Y}(x|y)\,P_{Z|Y}(z|y)} = 1. \tag{9}$$

Therefore,

$$P_{XZ|Y}(xz|y) = P_{X|Y}(x|y)\,P_{Z|Y}(z|y), \qquad P_{XYZ}(xyz) = P_Y(y)\,P_{X|Y}(x|y)\,P_{Z|Y}(z|y). \tag{10}$$

Hence a classical Markov chain distribution is characterized by zero conditional mutual
information. The classical case is characterized by an exact correspondence between the conditional
mutual information and the relative entropy distance to the set of Markov chains: for any joint
distribution $P_{XYZ}$ of three random variables $X$, $Y$, $Z$ [14],

$$I(X:Z|Y) = \min\{D(P\|Q) : Q \text{ Markov}\}. \tag{11}$$

It can be shown that the Markov chain required to minimise this quantity is

$$Q_{XYZ}(xyz) = P_Y(y)\,P_{X|Y}(x|y)\,P_{Z|Y}(z|y). \tag{12}$$
Proof. Imagine a joint probability distribution $Q_{XYZ}$ that also forms a general Markov chain:

$$Q_{XYZ}(xyz) = Q_Y(y)\,Q_{X|Y}(x|y)\,Q_{Z|Y}(z|y). \tag{13}$$

We can write the probability distribution $P_{XYZ}$ as follows:

$$P_{XYZ}(xyz) = P_Y(y)\,P_{Z|Y}(z|y)\,P_{X|YZ}(x|yz), \tag{14}$$

therefore the relative entropy between the two probability distributions is

$$D(P\|Q) = \sum_{xyz} P_{XYZ}(xyz)\log\frac{P_Y(y)\,P_{Z|Y}(z|y)\,P_{X|YZ}(x|yz)}{Q_Y(y)\,Q_{Z|Y}(z|y)\,Q_{X|Y}(x|y)}. \tag{15}$$

Since the logarithm of a product is a sum of logarithms, we can write the relative entropy as

$$D(P\|Q) = \sum_{xyz} P_{XYZ}(xyz)\left[\log\frac{P_Y(y)}{Q_Y(y)} + \log\frac{P_{Z|Y}(z|y)}{Q_{Z|Y}(z|y)} + \log\frac{P_{X|YZ}(x|yz)}{Q_{X|Y}(x|y)}\right]. \tag{16}$$

On inspection of the final term we can use the following equivalence:

$$\frac{P_{X|YZ}(x|yz)}{Q_{X|Y}(x|y)} = \frac{P_{XYZ}(xyz)}{P_{YZ}(yz)\,Q_{X|Y}(x|y)} = \frac{P_{Z|XY}(z|xy)}{P_{Z|Y}(z|y)}\,\frac{P_{X|Y}(x|y)}{Q_{X|Y}(x|y)}. \tag{17}$$

Observing that the first two terms of eq. (16) are relative entropy terms, we have

$$D(P\|Q) = D(P_Y(y)\|Q_Y(y)) + D(P_{Z|Y}(z|y)\|Q_{Z|Y}(z|y)) + D(P_{X|Y}(x|y)\|Q_{X|Y}(x|y)) + \sum_{xyz} P_{XYZ}(xyz)\log\frac{P_{Z|XY}(z|xy)}{P_{Z|Y}(z|y)}. \tag{18}$$

Note that the only terms that depend on the distribution $Q$ are the first three relative entropy
terms. Since relative entropy is non-negative and $D(S\|T) = 0$ if and only if $S = T$, the Markov
chain that provides the minimum relative entropy between $P$ and $Q$ can be achieved by setting these
terms to zero, which gives the required distribution in (12). This concludes the proof. □
From this result it is simple to show that the minimum equals the conditional mutual information.
We know that the relative entropy terms in eq. (18) are zero if we use $Q$ as the minimising Markov
chain, so that

$$D(P\|Q) = \sum_{xyz} P_{XYZ}(xyz)\log\frac{P_{Z|XY}(z|xy)}{P_{Z|Y}(z|y)}. \tag{19}$$

Using the following equivalence

$$P_{Z|XY}(z|xy) = \frac{P_{XYZ}(xyz)}{P_{XY}(xy)} = \frac{P_{XYZ}(xyz)}{P_Y(y)}\,\frac{P_Y(y)}{P_{XY}(xy)} = \frac{P_{XZ|Y}(xz|y)}{P_{X|Y}(x|y)}, \tag{20}$$

we can substitute this into (19) to produce the final result

$$D(P\|Q) = \sum_{xyz} P_{XYZ}(xyz)\log\frac{P_{XZ|Y}(xz|y)}{P_{X|Y}(x|y)\,P_{Z|Y}(z|y)} = I(X:Z|Y). \tag{21}$$
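
The classical statement (11), with minimiser (12), is easy to test numerically. The following is a
minimal Python sketch, assuming a randomly generated distribution on three bits; the dictionary
representation and all function names are illustrative choices. It compares $I(X:Z|Y)$, computed
from entropies, with $D(P\|Q)$ for the Markov chain of eq. (12), and with the relative entropy to one
other, arbitrarily chosen Markov chain.

```python
import itertools
import math
import random

def marginal(P, keep):
    """Marginal of P (a dict over (x, y, z)) on the coordinate indices in keep."""
    out = {}
    for xyz, p in P.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(D):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in D.values() if p > 0)

def relative_entropy(P, Q):
    """D(P||Q) in bits; assumes Q > 0 wherever P > 0."""
    return sum(p * math.log2(p / Q[k]) for k, p in P.items() if p > 0)

def markov_projection(P):
    """P_Y P_{X|Y} P_{Z|Y} = P_XY P_YZ / P_Y, the Markov chain of eq. (12)."""
    Pxy, Pyz, Py = marginal(P, (0, 1)), marginal(P, (1, 2)), marginal(P, (1,))
    return {(x, y, z): Pxy[(x, y)] * Pyz[(y, z)] / Py[(y,)] for (x, y, z) in P}

def conditional_mutual_information(P):
    """I(X:Z|Y) = H(XY) + H(YZ) - H(Y) - H(XYZ)."""
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def random_distribution(outcomes):
    weights = [random.random() for _ in outcomes]
    total = sum(weights)
    return {k: w / total for k, w in zip(outcomes, weights)}

random.seed(0)
outcomes = list(itertools.product(range(2), repeat=3))
P = random_distribution(outcomes)

cmi = conditional_mutual_information(P)
d_opt = relative_entropy(P, markov_projection(P))
# Any other Markov chain, here built from an unrelated random distribution.
d_other = relative_entropy(P, markov_projection(random_distribution(outcomes)))
print(f"I(X:Z|Y) = {cmi:.6f}, D(P||Q_opt) = {d_opt:.6f}, D(P||Q_other) = {d_other:.6f}")
```

The first two numbers agree up to rounding, and the third is never smaller, in line with eq. (11).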



III. QUANTUM ANALOGUE

A quantum analogue of (short) Markov chains, i.e. quantum states of some tripartite system
$ABC$ with a suitably defined Markov property, was first proposed by Accardi and Frigerio [1]. In
finite Hilbert space dimension, which is the case we consider in the present paper, this
property reads as follows: $\mu_{ABC}$ is a quantum Markov state if there exists a quantum channel,
i.e. a completely positive and trace preserving (c.p.t.p.) map $T : \mathcal{B}(B) \to \mathcal{B}(B)\otimes\mathcal{B}(C)$, such that
$\mu_{ABC} = (\mathrm{id}_A \otimes T)\mu_{AB}$, with $\mu_{AB} = \mathrm{Tr}_C\,\mu_{ABC}$. In [15] it was shown that this Markov condition is
equivalent to vanishing conditional mutual information,

$$I(A:C|B)_\mu = 0, \tag{22}$$

just as in the classical case, and in [13] the most general form of such states was given, as follows:
system $B$ has a direct sum decomposition into tensor products,

$$B = \bigoplus_j b_j^L \otimes b_j^R, \tag{23}$$

such that

$$\mu_{ABC} = \bigoplus_j p_j\,\mu^{(j)}_{Ab_j^L} \otimes \mu^{(j)}_{b_j^R C}. \tag{24}$$

Note that this precisely generalises eq. (7). We introduce the notation $\delta$ for the direct sum
decomposition of $B$. Note that we can always think of $\mathcal{H}_B$ as being a subspace of a larger Hilbert space
$\mathcal{H}_{\tilde B}$ (for which inclusion we use the shorthand $B \hookrightarrow \tilde B$). This doesn't change the fact that a state is
a Markov chain state or not, but it leads to more possibilities of decomposing the ambient Hilbert
space as a sum of products as in eq. (23). In other words, in a larger space there is a larger set of
quantum Markov chains. This latter is evidently going to be relevant when comparing a given
state $\rho_{ABC}$ to the class of Markov chain states: in general we will have to admit that all three
systems $A$, $B$, $C$ are subspaces of larger Hilbert spaces, and we have to take into account the Markov
states on the extended system.

Now we go on to develop some formalism to deal with these embeddings: If we have the
embedded system $B \hookrightarrow \tilde B$ then we define $\delta = \big(B \hookrightarrow \tilde B = \bigoplus_j \tilde B_j\big)$ as both the isometric
embedding and the orthogonal decomposition of the embedding system. For a specific such direct
sum decomposition $\delta$ we introduce $\tau = \tau_\delta$ for the family of tensor product decompositions of the
$\tilde B_j$, which are, w.l.o.g., embeddings $\tau_\delta = \big(\tau_j : \tilde B_j \hookrightarrow b_j^L \otimes b_j^R\big)_j$. Note that this latter only gives
us increased flexibility: we could as well demand that each $\tau_j$ is actually a unitary isomorphism
between $\tilde B_j$ and $b_j^L \otimes b_j^R$, because one can always blow up the spaces $\tilde B_j$, extending the isometry
to a unitary.

This brings us to the main question of this paper: for given state $\rho_{ABC}$, to find

$$\Delta(\rho) := \inf\{D(\rho\|\mu) : \mu \text{ Markov}\}, \tag{25}$$

and to compare it to $I(A:C|B)_\rho$. The remainder of this paper will be devoted to a study of the
properties of this function. To be precise, we would like to consider $\mu$ to be a Markov state on a
tripartite system $\tilde A\tilde B\tilde C$, with $A \subset \tilde A$, $B \subset \tilde B$ and $C \subset \tilde C$ (with $\rho$ understood to be also a state on
$\tilde A\tilde B\tilde C$ via these embeddings), which is why above we have to use the infimum, since the dimension
of $\tilde A\tilde B\tilde C$ is unbounded. This appears to be necessary for the reason that the decompositions as in
eq. (24) depend on the dimension of $\tilde B$.

We will show below (in the next section) that w.l.o.g. $\tilde A = A$ and $\tilde C = C$, and $\dim\tilde B \leq d_B^4$,
so that the infimum is actually a minimum. We also show lower bounds on $\Delta$ comparing it to
$I(A:C|B)$, in particular exhibiting examples of states $\rho$ with $\Delta(\rho) \gg I(A:C|B)_\rho$.



IV. GENERAL PROPERTIES OF $\Delta$

Here we show that the problem of determining the minimum relative entropy to a Markov
state is really only a minimisation over decompositions of the type (23) for $\tilde B$. For given
dimensions of the quantum system, there is only a finite number of decomposition types.
Therefore we need to perform a finite-dimensional optimisation for each decomposition (some of
which we can perform explicitly) and choose the global minimum.

Proposition 1 We denote by $\omega[\delta,\tau]$ the optimal state for a given direct sum and tensor
decomposition, where $\delta$ describes the specific direct sum and $\tau = \tau_\delta$ the chosen tensor decomposition
for that direct sum. We obtain $\omega[\delta,\tau]$ by the following procedure: first, with the subspace
projections $P_j$ onto $b_j^L \otimes b_j^R \subset \tilde B$, let

$$\omega[\delta] := \bigoplus_j (\mathbb{1}_{AC}\otimes P_j)\,\rho\,(\mathbb{1}_{AC}\otimes P_j) = \bigoplus_j q_j\,\omega^{(j)}_{Ab_j^Lb_j^RC}, \tag{26}$$

where for each part $j$ of the direct sum for the given decomposition, we project system $B$ via the
corresponding projections $P_j$ to produce $\omega^{(j)}_{Ab_j^Lb_j^RC}$ with corresponding probability $q_j$. Then, form the
reduced states $\sigma^{(j)}_{Ab_j^L} = \mathrm{Tr}_{b_j^RC}\,\omega^{(j)}_{Ab_j^Lb_j^RC}$ and $\chi^{(j)}_{b_j^RC} = \mathrm{Tr}_{Ab_j^L}\,\omega^{(j)}_{Ab_j^Lb_j^RC}$, and let

$$\omega[\delta,\tau] := \bigoplus_j q_j\,\sigma^{(j)}_{Ab_j^L} \otimes \chi^{(j)}_{b_j^RC}. \tag{27}$$

With these definitions, it is easy to work out that

$$D(\rho\|\omega[\delta,\tau]) = -S(\rho) + H(q) + \sum_j q_j\left(S\big(\sigma^{(j)}_{Ab_j^L}\big) + S\big(\chi^{(j)}_{b_j^RC}\big)\right). \tag{28}$$

Then, among all Markov states with decomposition (23) of $\tilde B$, $\omega[\delta,\tau]$ is the one with smallest relative
entropy, given by eq. (28).

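Proof. The relative entropy between a general state $\rho_{ABC}$ and a general quantum Markov state
$\mu_{ABC}$ is, with given decompositions $\delta$ and $\tau$,

$$D(\rho_{ABC}\|\mu_{ABC}) = -S(\rho_{ABC}) - \mathrm{Tr}(\rho_{ABC}\log\mu_{ABC}), \tag{29}$$

where the general quantum Markov state is defined as

$$\mu_{ABC} = \bigoplus_j p_j\,\mu^{(j)}_{Ab_j^L} \otimes \mu^{(j)}_{b_j^RC}. \tag{30}$$

Therefore we can calculate the logarithm of $\mu_{ABC}$,

$$\log\mu_{ABC} = \bigoplus_j\left(\log p_j\,(\mathbb{1}_{AC}\otimes P_j) + \log\big(\mu^{(j)}_{Ab_j^L} \otimes \mu^{(j)}_{b_j^RC}\big)\right) =: \bigoplus_j L_j, \tag{31}$$

where $P_j$ are the subspace projections from $\tilde B$ onto $b_j^L \otimes b_j^R$. For clarity we assume $\rho$ indicates the
state over all parties unless otherwise indicated. Then

$$\mathrm{Tr}\,\rho_{ABC}\log\mu_{ABC} = \sum_j \mathrm{Tr}\left((\mathbb{1}_{AC}\otimes P_j)\,\rho\,(\mathbb{1}_{AC}\otimes P_j)\,L_j\right). \tag{32}$$

Now $(\mathbb{1}_{AC}\otimes P_j)\,\rho\,(\mathbb{1}_{AC}\otimes P_j) = q_j\,\omega^{(j)}_{Ab_j^Lb_j^RC}$ with $\mathrm{Tr}\,\omega^{(j)}_{Ab_j^Lb_j^RC} = 1$. Therefore,

$$\mathrm{Tr}\,\rho_{ABC}\log\mu_{ABC} = \sum_j \mathrm{Tr}\,q_j\,\omega^{(j)}_{Ab_j^Lb_j^RC}\log p_j\,(\mathbb{1}_{AC}\otimes P_j) + \sum_j \mathrm{Tr}\,q_j\,\omega^{(j)}_{Ab_j^Lb_j^RC}\log\big(\mu^{(j)}_{Ab_j^L} \otimes \mu^{(j)}_{b_j^RC}\big) \tag{33}$$

$$= \sum_j q_j\log p_j + \sum_j q_j\,\mathrm{Tr}\,\omega^{(j)}_{Ab_j^Lb_j^RC}\log\big(\mu^{(j)}_{Ab_j^L} \otimes \mu^{(j)}_{b_j^RC}\big) \tag{34}$$

$$= -H(q) - D(q\|p) - \sum_j q_j\left(S\big(\sigma^{(j)}_{Ab_j^L}\big) + S\big(\chi^{(j)}_{b_j^RC}\big) + D\big(\sigma^{(j)}_{Ab_j^L}\big\|\mu^{(j)}_{Ab_j^L}\big) + D\big(\chi^{(j)}_{b_j^RC}\big\|\mu^{(j)}_{b_j^RC}\big)\right), \tag{35}$$

where $\sigma^{(j)}_{Ab_j^L} = \mathrm{Tr}_{b_j^RC}\,\omega^{(j)}_{Ab_j^Lb_j^RC}$ and $\chi^{(j)}_{b_j^RC} = \mathrm{Tr}_{Ab_j^L}\,\omega^{(j)}_{Ab_j^Lb_j^RC}$. For a given decomposition of system $B$,
the subspace projections $P_j$ and hence $q_j$ and $\omega^{(j)}_{Ab_j^Lb_j^RC}$ are fixed. Since we want to minimise the
relative entropy we want to maximise the quantity $\mathrm{Tr}\,\rho_{ABC}\log\mu_{ABC}$. The term $-D(q\|p)$ is
largest, namely zero, when we set $p_j = q_j$. For the sum we consider each $j$ individually and only
have the freedom of setting the last two relative entropy terms to zero. Therefore we can maximise
this expression by setting $\mu^{(j)}_{Ab_j^L} = \sigma^{(j)}_{Ab_j^L}$ and $\mu^{(j)}_{b_j^RC} = \chi^{(j)}_{b_j^RC}$, which concludes the proof. □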
A nice observation is that the relative entropy of interest can be decomposed into two relative
entropies, as follows:

$$D(\rho\|\omega[\delta,\tau]) = D(\rho\|\omega[\delta]) + D(\omega[\delta]\|\omega[\delta,\tau]) = D(\rho\|\omega[\delta]) + \sum_j q_j\,I(Ab_j^L : b_j^RC)_{\omega^{(j)}}. \tag{36}$$

Note that this result has an important consequence for the infimum defining $\Delta(\rho)$: we only need
to worry about embedding $B$ into a larger system $\tilde B$; $A$ and $C$ can, w.l.o.g., stay the same.

Theorem 2 The infimum of eq. (25) is achieved on a decomposition $\tilde B = \bigoplus_{j=1}^{d_B^2} b_j^L \otimes b_j^R$, with
$\dim b_j^L, \dim b_j^R \leq d_B$. In particular, because it is one of a continuous function over a compact domain,
the infimum is actually a minimum. Also, it means that $\Delta(\rho)$, as the minimum of a continuous function
over a compact domain, is itself a continuous function of its argument $\rho$.

The reader may wish to skip the rather lengthy and somewhat technical proof of this theorem;
note however that in it some notation is introduced which is referred to later.

Proof. The proof has two parts: first, that the direct sum decomposition $\delta$ may be taken to have
only $d_B^2$ terms, and second, that each direct summand may be embedded into a space of not more
than $d_B \times d_B$ dimensions. These two arguments are quite independent of each other; we start
with the first.

1. For given embedding $B \hookrightarrow \tilde B$ and decomposition of $\tilde B$, we have

$$\omega[\delta] = \bigoplus_j (\mathbb{1}_{AC}\otimes P_j)\,\rho_{ABC}\,(\mathbb{1}_{AC}\otimes P_j) = \bigoplus_j (\mathbb{1}_{AC}\otimes P_jP)\,\rho_{ABC}\,(\mathbb{1}_{AC}\otimes PP_j), \tag{37}$$

with the projector $P$ of $\tilde B$ onto $B$. Note that the operators $P_jP$ form a complete Kraus system:

$$\sum_j (P_jP)^\dagger(P_jP) = \sum_j PP_jP = P = \mathbb{1}_B. \tag{38}$$

Hence the operators $M_j = PP_jP$ form a POVM on $B$, and introducing an auxiliary register $J$
with orthogonal states $|j\rangle$ to reflect the direct sum, $\omega[\delta]$ is equivalent, up to local isometries, to the
state

$$\Omega = \sum_j (\mathbb{1}_{AC}\otimes\sqrt{M_j})\,\rho_{ABC}\,(\mathbb{1}_{AC}\otimes\sqrt{M_j}) \otimes |j\rangle\langle j|_J. \tag{39}$$

At the same time, the embedding $\tau_\delta$ can be reinterpreted as a family of isometries $\tau_j : B \hookrightarrow b^L \otimes b^R$
controlled by the content $j$ of the $J$-register (note that we may, w.l.o.g., assume that the $\tau_j$ all map
into the same tensor product space), so that the state after the action of $\tau$ is

$$\Omega_{Ab^Lb^RCJ} = \sum_j (\mathbb{1}_{AC}\otimes\tau_j\sqrt{M_j})\,\rho_{ABC}\,(\mathbb{1}_{AC}\otimes\sqrt{M_j}\,\tau_j^\dagger) \otimes |j\rangle\langle j|_J. \tag{40}$$

In this notation, our formula (28) can be rewritten as

$$D(\rho\|\omega[\delta,\tau]) = -S(\rho) + S(J) + S(Ab^L|J) + S(b^RC|J). \tag{41}$$

Now, to reduce the number of POVM elements (i.e., entries of the $J$-register with non-zero
probability amplitude), we invoke a theorem of Davies [10] on extremal POVMs: One looks at all real
vectors $(\lambda_j)_j$ such that the operators $\lambda_jM_j$ form a POVM, i.e., $\sum_j \lambda_jM_j = \mathbb{1}_B$. It is clear that the
all-ones vector is eligible, and that this set is compact and convex; in fact, it is a polytope, and
Davies' theorem states that its extremal points have at most $d_B^2$ non-zero entries (actually, this
is just a special case of Caratheodory's lemma). On the other hand, the all-ones vector can be
convex-decomposed into extremal ones, i.e.,

$$\forall j\quad M_j = \sum_k r_k\,\lambda_j^{(k)}M_j, \tag{42}$$

with extremal vectors $(\lambda_j^{(k)})_j$ and positive reals $r_k$ with $\sum_k r_k = 1$. In operational terms, the
POVM $(M_j)$ is equivalent to choosing $K = k$ with probability $r_k$ and then measuring the POVM
$(\lambda_j^{(k)}M_j)_j$. This means that we can extend the state $\Omega$ above to

$$\Omega_{Ab^Lb^RCJK} = \sum_{jk} r_k\,\Big(\mathbb{1}_{AC}\otimes\tau_j\sqrt{\lambda_j^{(k)}M_j}\Big)\,\rho_{ABC}\,\Big(\mathbb{1}_{AC}\otimes\sqrt{\lambda_j^{(k)}M_j}\,\tau_j^\dagger\Big) \otimes |j\rangle\langle j|_J \otimes |k\rangle\langle k|_K, \tag{43}$$

of which it can be readily verified that tracing over $K$ gives eq. (40). Then, by the concavity of the
von Neumann entropy, $S(J) \geq S(J|K)$ and by the way we constructed the POVMs,

$$S(Ab^L|J) = S(Ab^L|JK), \qquad S(b^RC|J) = S(b^RC|JK). \tag{44}$$

Hence, eq. (41) is lower bounded by

$$-S(\rho) + S(J|K) + S(Ab^L|JK) + S(b^RC|JK), \tag{45}$$

and there exists a value $k$ of $K$ for which

$$D(\rho\|\omega[\delta,\tau]) \geq -S(\rho) + S(J|K=k) + S(Ab^L|JK=k) + S(b^RC|JK=k). \tag{46}$$

But for each $K = k$, the right hand side is a relative entropy with a Markov state referring to
the POVM $(\lambda_j^{(k)}M_j)_j$; it can be lifted, by Naimark's theorem, to an orthogonal measurement on a
larger space $\tilde B$. It is clear that w.l.o.g. $\tilde B_j$ has dimension at most $d_B$: the state $\omega^{(j)}_{A\tilde B_jC}$ is supported
in $\tilde B_j$ on a subspace of dimension at most $d_B$.

2. Now for the second part: looking at eq. (28), we see that once $\delta$ is fixed, we have states $\omega_{A\tilde B_jC}$
and we need to find, for each $j$ individually, a decomposition/embedding $\tau_j : \tilde B_j \hookrightarrow b_j^L \otimes b_j^R$ that
minimises the term $S\big(\sigma^{(j)}_{Ab_j^L}\big) + S\big(\chi^{(j)}_{b_j^RC}\big)$ in eq. (28). Dropping the index $j$ for now, since we will
keep it fixed, let us introduce a purification $|\phi\rangle_{ABCD}$ of $\omega_{ABC}$; then, with the isometric embedding
$\tau : B \hookrightarrow b^Lb^R$ implicit and the slight abuse of notation

$$|\phi\rangle_{Ab^Lb^RCD} := (\mathbb{1}_{ACD}\otimes\tau)\,|\phi\rangle_{ABCD}, \tag{47}$$

our task is to minimise, over all choices of $\tau$,

$$S(Ab^L) + S(b^RC) = S(Ab^L) + S(ADb^L). \tag{48}$$

Now notice that the latter quantity refers only to subsystems $AD$ and $b^L$, and that hence we can
describe it entirely by the state $\mathrm{Tr}_C\,\phi_{ABCD} =: \phi_{ABD}$ and the completely positive and trace
preserving map $T := \mathrm{Tr}_{b^R}\circ\tau$ mapping density operators on $B$ to density operators on $b^L$; by
Stinespring's theorem, conversely every such quantum channel can be lifted to an isometric dilation
$\tau : B \hookrightarrow b^L \otimes b^R$ (the system $b^R$ would be called the environment of the channel). For fixed output
system $b^L$ the set of these quantum channels is convex and the state $(\mathrm{id}_{AD}\otimes T)\phi_{ABD}$ is a linear
function of the map. Hence, by the concavity of the von Neumann entropy $S$, the smallest sum
of entropies (48) is attained for extremal channels, which by a theorem of Choi [6] have at most
$d_B$ operator terms in the Kraus decomposition, which translates into a dimension of at most $d_B$
of $b^R$. The dimensionality of the subsystem covered by the output of that channel in $b^L$ is thus at
most $d_B^2$. But now we can run the same argument for $b^R$ instead (the whole setup is symmetric),
so the channel from $B$ to $b^R$ is also w.l.o.g. extremal, entailing $\dim b^L \leq d_B$ (note that we fix the
output dimension here to $\leq d_B$ from the previous argument). □
There is a special case of the second part of the above proof in the literature that has inspired the
present argument: that is the dimension bounds in the so-called entanglement of purification [20].
There it was shown that in the problem of, for a pure state $\omega_{ABC}$, minimising the entropy

$$S(AE) = \frac{1}{2}\big(S(AE) + S(CF)\big), \tag{49}$$

over all isometric embeddings $B \hookrightarrow EF$, one may restrict to a priori bounded dimensions $\dim E = d_B$
and $\dim F = d_B^2$, or, vice versa, $\dim E = d_B^2$ and $\dim F = d_B$. What is noticed above is that,
apart from the generalisation to mixed states, one can apply the argument of the extremal channels
twice, to get the same bound $d_B$ on the dimensions of both $E$ and $F$:

Corollary 3 The entanglement of purification,

$$E_P(\rho_{AC}) = \inf_{B \hookrightarrow EF} S(AE), \tag{50}$$

the entropy understood with respect to the state $\phi_{AEFC}$, is attained at an embedding with dimensions
$\dim E, \dim F \leq d_B = \mathrm{rank}\,\rho_{AC}$. □

Theorem 4 For any state $\rho_{ABC}$, the quantity $\Delta(\rho_{ABC})$ has the following lower bound:

$$\Delta(\rho) \geq I(A:C|B)_\rho. \tag{51}$$

Proof. Indeed, it is sufficient to show, for any $\rho_{ABC}$ and decomposition of $\tilde B$ as in eq. (23) with
accompanying state $\omega[\delta,\tau]$, that

$$D(\rho\|\omega[\delta,\tau]) \geq I(A:C|B)_\rho, \tag{52}$$

which, by eq. (28), is equivalent to

$$H(q) + \sum_j q_j\left(S\big(\sigma^{(j)}_{Ab_j^L}\big) + S\big(\chi^{(j)}_{b_j^RC}\big)\right) \geq S(B)_\rho + S(A|B)_\rho + S(C|B)_\rho. \tag{53}$$

It turns out to be convenient to introduce the following state of five registers to represent the
entropic quantities in the above:

$$\Omega_{Ab^Lb^RCJ} = \sum_j q_j\,\omega^{(j)}_{Ab_j^Lb_j^RC} \otimes |j\rangle\langle j|_J, \tag{54}$$

observing that we may think of all $b_j^L$, $b_j^R$ as subspaces of one $b^L$, $b^R$, respectively. Then the
inequality we need to prove reads

$$S(J)_\Omega + S(Ab^L|J)_\Omega + S(b^RC|J)_\Omega \geq S(B)_\rho + S(A|B)_\rho + S(C|B)_\rho. \tag{55}$$

This is done by invoking standard inequalities as follows:

$$\begin{aligned}
S(J)_\Omega + S(Ab^L|J)_\Omega + S(b^RC|J)_\Omega &= S(J)_\Omega + S(A|b^LJ)_\Omega + S(b^L|J)_\Omega + S(C|b^RJ)_\Omega + S(b^R|J)_\Omega \\
&\geq S(J)_\Omega + S(b^Lb^R|J)_\Omega + S(A|b^LJ)_\Omega + S(C|b^RJ)_\Omega \\
&= S(Jb^Lb^R)_\Omega + S(A|b^LJ)_\Omega + S(C|b^RJ)_\Omega \\
&\geq S(B)_\rho + S(A|B)_\rho + S(C|B)_\rho,
\end{aligned} \tag{56}$$

where in the second line we have used ordinary subadditivity of entropy, and in the fourth line
the fact that $\Omega$ is obtained from $\rho$ by a unital c.p.t.p. map on $B$; it can only increase the entropy,
and, since it induces c.p.t.p. maps from $B$ to $Jb^L$ and $Jb^R$, we can use the non-decrease of the
conditional entropy under processing of the condition (that's basically strong subadditivity). □

That means, for given dimensions $d_A$, $d_B$, $d_C$, we may define the continuous and monotonic
real function

$$\Delta(t; d_A, d_B, d_C) := \max\{\Delta(\rho_{ABC}) : I(A:C|B)_\rho \leq t\}, \tag{57}$$

which has the property $\Delta(t; d_A, d_B, d_C) = 0$ if and only if $t = 0$, and $\Delta(t; d_A, d_B, d_C) \geq t$ (for not
too large $t$, i.e. $t \leq 2\log\min\{d_A, d_C\}$).

V. PURE STATES

Here we give some results when $\rho$ is a pure state. The entropy of the density matrix of a pure
state is zero, and the minimum over $\tau_j$ of the $j$-th von Neumann entropy term in the sum in eq. (28)
is twice the entanglement of purification of $\rho^{(j)}_{AC}$ [20]. Thus, we arrive at the formula

$$\min_\tau D(\psi\|\omega[\delta,\tau]) = H(q) + 2\sum_j q_j\,E_P\big(\rho^{(j)}_{AC}\big). \tag{58}$$

Using the calculation of $E_P$ for symmetric and antisymmetric states in [10], we can now show:
Theorem 5 Let $A$ and $C$ be systems of the same dimension $d$. For any pure state $\psi_{ABC}$ such that
$\rho_{AC} = \mathrm{Tr}_B\,\psi_{ABC}$ is supported either on the symmetric or on the antisymmetric subspace of $AC$, we have

$$S(\rho_A) \leq \Delta(\psi) \leq 2S(\rho_A). \tag{59}$$

Proof. The upper bound can be simply derived by considering a single decomposition of system
$B$ and calculating the value of $D(\psi\|\omega[\delta,\tau])$. Since $\Delta(\psi)$ is a minimum over all possible
decompositions of system $B$, choosing one will immediately give an upper bound. Consider the following
decomposition,

$$B = b^L \otimes b^R := B \otimes \mathbb{C}. \tag{60}$$

This gives a single term of tensor products leading to the following density matrix

$$\omega[\delta,\tau] = \rho_{AB} \otimes \rho_C, \tag{61}$$

therefore we have

$$D(\psi\|\omega[\delta,\tau]) = 2E_P(\rho_{AC}). \tag{62}$$

A property of the entanglement of purification [20] is that if a two-party state $\rho_{AC}$ is completely
supported either on the symmetric or antisymmetric subspace of $AC$ then the entanglement of
purification is simply the entropy of the reduced state of one of the parties [10],

$$E_P(\rho_{AC}) = S(\rho_A) = S(\rho_C). \tag{63}$$

Hence we have proved the upper bound.

The lower bound is a consequence of strong subadditivity of quantum entropy. We know from
eqs. (58) and (63) that, for the optimal decomposition,

$$\Delta(\psi) = H(q) + 2\sum_j q_j\,S\big(\rho_A^{(j)}\big) \geq H(q) + \sum_j q_j\,S\big(\rho_A^{(j)}\big). \tag{64}$$

Note, however, that

$$H(q) + \sum_j q_j\,S\big(\rho_A^{(j)}\big) \geq S\Big(\sum_j q_j\,\rho_A^{(j)}\Big) = S(\rho_A). \tag{65}$$

Hence we have shown the lower bound and this concludes the proof. □
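
The mixing inequality (65) can be illustrated in the special case where all the $\rho_A^{(j)}$ commute, so
that they are described by their spectra in a common eigenbasis. The following minimal Python sketch
covers only this classical special case; the weights and spectra are hypothetical numbers chosen for
illustration.

```python
import math

def H(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical ensemble: weights q_j and spectra of commuting states rho_A^(j).
q = [0.5, 0.3, 0.2]
spectra = [
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]

# Spectrum of the average state sum_j q_j rho_A^(j) in the common eigenbasis.
average = [sum(qj * s[i] for qj, s in zip(q, spectra)) for i in range(3)]

lhs = H(q) + sum(qj * H(s) for qj, s in zip(q, spectra))
rhs = H(average)
print(f"H(q) + sum_j q_j S(rho_j) = {lhs:.4f} >= S(average) = {rhs:.4f}")
```

In this commuting case the left-hand side is the joint entropy of the index $j$ and the outcome,
which can only exceed the entropy of the outcome alone.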



VI. EXAMPLES

In this section we examine families of states which we can use to numerically illustrate the
bounds on $\Delta(\rho)$. We look at several families of examples, starting with one on three qubits.

Example 6 Consider the following family of three qubit states

$$|\psi(x)\rangle_{ABC} = \frac{1}{\sqrt{2}}\Big(|\varphi_x\rangle_A|0\rangle_B|\varphi_x\rangle_C + |\varphi_{-x}\rangle_A|1\rangle_B|\varphi_{-x}\rangle_C\Big), \tag{66}$$

where $|\varphi_x\rangle := \sqrt{1-x^2}\,|0\rangle + x|1\rangle$, and $x$ is a real parameter. Using the notation $y = \sqrt{1-x^2}$ so that
$y^2 + x^2 = 1$, we can calculate the following reduced density matrices for this pure state:

$$\rho_A = \rho_C = \begin{pmatrix} y^2 & 0 \\ 0 & x^2 \end{pmatrix}, \tag{67}$$

$$\rho_B = \frac{1}{2}\begin{pmatrix} 1 & (y^2-x^2)^2 \\ (y^2-x^2)^2 & 1 \end{pmatrix}. \tag{68}$$

Therefore we can calculate the entropy of each single party density matrix:

$$S(\rho_A) = S(\rho_C) = -y^2\log y^2 - x^2\log x^2 = H_2(x^2), \tag{69}$$

$$S(\rho_B) = -(y^4+x^4)\log(y^4+x^4) - 2x^2y^2\log 2x^2y^2. \tag{70}$$

From theorem 5 we know that for totally symmetric or totally anti-symmetric states,
$S(\rho_A) \leq \Delta(\rho) \leq 2S(\rho_A)$. Note also that for this state
$I(A:C|B)_{\psi(x)} = S(AB) + S(BC) - S(B) = 2S(A) - S(B)$. Thus, we wish to understand the ratio

$$\frac{S(\rho_A)}{2S(\rho_A) - S(\rho_B)}. \tag{71}$$

We look at the leading order terms of the single party entropies: for small $x > 0$, the terms
$x^2\log x$ and $x^2$ dominate $x^4$, $x^6$, etc. Thus, only taking $x^2\log x$ and $x^2$ terms,

$$S(\rho_A) = -(1-x^2)\log(1-x^2) - 2x^2\log x = -\frac{2x^2\ln x}{\ln 2} + \frac{x^2}{\ln 2} + O(x^4), \tag{72}$$

$$\begin{aligned}
S(\rho_B) &= -(1-2x^2+2x^4)\log(1-2x^2+2x^4) - 2x^2(1-x^2)\big[1 + 2\log x + \log(1-x^2)\big] \\
&= -\frac{4x^2\ln x}{\ln 2} + \frac{2x^2}{\ln 2} - 2x^2 + O(x^4\log x).
\end{aligned} \tag{73}$$

Inserting these expressions, we find

$$\frac{\Delta(\psi)}{I(A:C|B)} \geq \frac{S(\rho_A)}{2S(\rho_A) - S(\rho_B)} = -\frac{\ln x}{\ln 2} + O(1) \quad\text{as } x \to 0. \tag{74}$$

Therefore for this state we can make this quantity approach $+\infty$ as the value of $x$ decreases,
marking a striking deviation from the classical case.
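
The divergence of the ratio (71) can be observed numerically from the closed forms (69) and (70).
The following is a minimal Python sketch; the function names are illustrative, and the constant
$1/(2\ln 2)$ in `leading_order` is our own evaluation of the $O(1)$ term from the expansions (72) and
(73), so it should be read as an assumption of this sketch rather than a statement of the text.

```python
import math

def h2(p):
    """Binary entropy in bits; log1p keeps precision when p is tiny."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log1p(-p)) / math.log(2)

def entropy_ratio(x):
    """S(rho_A) / (2 S(rho_A) - S(rho_B)) for the state psi(x) of eq. (66)."""
    s_a = h2(x * x)                     # eq. (69)
    s_b = h2(2 * x * x * (1 - x * x))   # eq. (70): eigenvalues 2x^2y^2 and x^4 + y^4
    return s_a / (2 * s_a - s_b)

def leading_order(x):
    """-log2(x) plus the constant obtained from the x^2 terms of (72) and (73)."""
    return -math.log2(x) + 1 / (2 * math.log(2))

for x in (1e-1, 1e-2, 1e-3):
    print(f"x = {x:g}: ratio = {entropy_ratio(x):.4f}, leading order = {leading_order(x):.4f}")
```

The ratio grows without bound as $x$ decreases, roughly like $-\log x$, which is the behaviour
claimed in eq. (74).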

Example 7 Another use of theorem 5 is for the pure states $|\zeta(d)\rangle_{ABC}$ on systems $A$ and $C$ of
dimension $d$ and $B$ of dimension $d(d+1)/2$: namely, $|\zeta(d)\rangle$ is the purification of the completely
mixed state on the symmetric subspace of $AC$, i.e. $\zeta_{AC} = \mathrm{Tr}_B\,\zeta_{ABC}$ is proportional to the
symmetric subspace projector, of rank $d(d+1)/2$. For this family of states, we have

$$I(A:C|B)_{\zeta(d)} = 1 + \log\frac{d}{d+1} < 1, \tag{75}$$

while according to our theorem,

$$\Delta(\zeta(d)) \geq S(A) = \log d. \tag{76}$$

This example shows that not only must any bound on $\Delta$ depend nonlinearly on $I(A:C|B)$, but
that a log-dimensional factor is also necessary.
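
To make the dimensional gap concrete, the following short Python sketch tabulates the two
quantities of eqs. (75) and (76). It uses $I(A:C|B) = S(A) + S(C) - S(AC)$, valid for a pure state on
$ABC$, together with $S(A) = S(C) = \log d$ and $S(AC) = \log\frac{d(d+1)}{2}$; the function names are
illustrative.

```python
import math

def cmi_zeta(d):
    """I(A:C|B) for zeta(d): 2 log d - log(d (d + 1) / 2) = 1 + log(d / (d + 1))."""
    return 2 * math.log2(d) - math.log2(d * (d + 1) / 2)

def delta_lower_bound(d):
    """Lower bound S(A) = log d on Delta(zeta(d)) from eq. (76)."""
    return math.log2(d)

for d in (2, 4, 16, 256, 1024):
    print(f"d = {d:5d}: I(A:C|B) = {cmi_zeta(d):.4f}, Delta >= {delta_lower_bound(d):.4f}")
```

The conditional mutual information stays below one bit for every $d$, while the lower bound on
$\Delta$ grows like $\log d$.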

Example 8 Consider the class of states

$$\rho_{ABC} = \sum_j p_j\,|j\rangle\langle j|_A \otimes |\psi_j\rangle\langle\psi_j|_B \otimes |j\rangle\langle j|_C, \tag{77}$$

characterised by an ensemble of pure states $\{p_j, |\psi_j\rangle\}$ on $B$; the states of $A$ and $C$ are meant
to be mutually orthogonal states. For a POVM $(M_k)$ on $B$, and using the previous notation of
$\tilde B = Kb^Lb^R$, the optimal state is given by

$$\omega[M]_{ABC} = \sum_{jk} p_j\,|j\rangle\langle j|_A \otimes \big(\sqrt{M_k}\,|\psi_j\rangle\langle\psi_j|\,\sqrt{M_k}\big)_{b^Lb^R} \otimes |k\rangle\langle k|_K \otimes |j\rangle\langle j|_C. \tag{78}$$

We can calculate the following using formula (41), using the fact that the system is symmetric in
systems $A$ and $C$ and that $S(\rho) = S(A)$:

$$D(\rho\|\omega[\delta,\tau]) = -S(A) + S(K) + S(Ab^L|K) + S(Ab^R|K). \tag{79}$$

It is fairly clear from the formula, since $S(Ab^L|K) + S(Ab^R|K) \geq 2S(A|K)$, that the optimal choice
of $b^Lb^R$ is to make one trivial, the other $B$, so that

$$D(\rho\|\omega[\delta,\tau]) = -S(A) + S(K) + 2S(A|K) = S(A|K) + S(K|A), \tag{80}$$

all entropies relative to the state $\omega$. Note that $A$ and $K$ are essentially classical registers, so that the
above is really a classical probabilistic/entropic formula for the relative entropy. It is also quite
amusing to see a quantity appearing that is known as information-distance in other contexts (see
e.g. [?]).

VII. CONCLUSIONS

We have investigated the relation between the quantum conditional mutual information of a
three-party state, and its relative entropy distance from the set of all (short) quantum Markov
chains. While the latter is always greater than or equal to the former, with equality in the classical
case, in general the relative entropy distance can be much larger than the conditional mutual
information. We showed this by developing tools to lower bound the relative entropy distance,
in particular for pure states of a special symmetric form. In the process we found many useful
properties of the minimum relative entropy distance from Markov states. Our findings indicate
that the characterisation of quantum Markov chains in terms of vanishing quantum conditional
mutual information is not robust, or at least not at all like the classical case, or the (quantum
and classical) case of ordinary mutual information. Since these lower bounds are additive for
tensor products of states, this surprising and perhaps displeasing behaviour will not go away in
an asymptotic limit of many copies of the state.

What we haven't found is an upper bound on the relative entropy distance $\Delta$ in terms of the
conditional mutual information $I(A:C|B)$; our examples above show that such a bound has to
depend nonlinearly on $I$ and it has to contain a factor proportional to the logarithm of one or
more of the local dimensions. Note that if there were a bound of the form $\Delta(\rho) \leq f(I)\log(d_Ad_C)$,
in particular not depending on the dimension of $B$, then this would settle a question left open
in [?]: namely, it would imply that the "squashed entanglement" $E_{sq}(\rho_{AB})$ of a bipartite state $\rho_{AB}$
is zero if and only if the state is separable. (We are grateful to Pawel Horodecki for pointing this
out to us.)

We close by pointing out that our results cast doubts on earlier ideas of two of the present
authors (NL and AW), reported in [18], on how to prove a non-standard inequality for the von
Neumann entropy. The heuristics given there don't seem to bear out, in the light of the present
paper; of course, the conjectured entropy inequality itself may well still be true.